During the project period, work has been focused on the following three
areas:

i. estimating the stratum's crop acreage proportion using the multiyear 
area estimation model, 

ii. assessment of multiyear sampling designs, and 

iii. development of statistical methodology for incorporating partially
identified sample segments into crop area estimation.

Although each of these areas is reviewed separately below, the overall goal
of improved crop area estimation utilizes all three areas jointly.

Our objectives in this project have been more than met. We have developed
and documented the statistical methodology needed to utilize the multiyear area
estimation model to produce a good estimate of the stratum's crop area proportion
based upon current and previous years' estimated crop area proportions
in sample segments. By assessing the impact on the stratum's crop area estimate,
we have derived recommendations for how the sample of segments should vary from
year to year. Finally, we have determined and tested procedures for explicitly
utilizing sample segments with only partially identified acreages as well as
sample segments with completely identified crop acreages.

The three aspects of our research under this project have been separately
documented in our technical reports 20, 21, and 22. Dr. R. L. Sielken Jr. gave
two invited presentations on our research at the Joint Statistical Meetings of
the American Statistical Association and The Biometric Society in Cincinnati,
Ohio, during August 16-19, 1982. These presentations were entitled "Multiyear,
Through-the-Season Crop Acreage Estimation Using Estimated Acreage in Sample
Segments" and "Incorporating Partially Classified Sample Segments Into NASA
Acreage Estimation Procedures". Dr. E. E. Gbur also gave a presentation at the
same national meetings entitled "Rotation Sampling Designs" which reported on our
research concerning multiyear sampling. All three of these presentations
are being published in the Proceedings of the Section on Survey Research
Methods.


1. Estimating the Stratum's Crop Acreage Proportion 
Using the Multiyear Area Estimation Model 


The basic model relating the stratum's at harvest crop acreage to the 
crop's estimated at harvest acreage in the sample segments has the general 
form 

    y(observation) = year effect + segment effect + season bias + noise    (1)

where y(·) is an appropriate transformation. The specific form of model (1)
is

    $y(p_{ts\ell}) = \alpha_t + b_s + \delta_\ell + e_{ts\ell}$,    (2)

where

    $t = 1, \ldots, T$,
    $s = 1, \ldots, S$,
    $\ell = 1, \ldots, L$,


$p_{ts\ell}$ = the estimated proportion of the s-th segment's acreage that will
contain the crop at harvest time in the t-th year when the
estimate is made at crop calendar time $\ell$ (for example, $\ell = 1$
could denote early season, $\ell = 2$ mid-season, and $\ell = 3$ at harvest
time);

$y(p_{ts\ell})$ = a variate transformation of $p_{ts\ell}$;

$\alpha_t$ = the stratum's transformed crop acreage proportion for the t-th
year;

$b_s$ = the s-th sampled segment's departure from the stratum's transformed
crop acreage proportion; the $b_s$'s are random variables with
expectation zero and variance $\sigma_b^2$;

$\delta_\ell$ = the systematic difference between the non-harvest time estimates
of the crop's transformed at harvest acreage proportion and the
corresponding estimate made at harvest time ($\delta_L = 0$);

$e_{ts\ell}$ = an aggregate of sampling and classification errors in the
transformed data.

The primary objective is to estimate the crop's at harvest proportion
of the stratum acreage in the current year, T; that is, estimate the inverse
transformation of $\alpha_T$, denoted by $P_T = y^{-1}(\alpha_T)$. Secondary
objectives could be improved estimates of at harvest acreages in previous
years or estimates of changes in the stratum's crop at harvest acreage
proportion from year to year.

Estimates of the stratum's crop at harvest acreage proportion are often
needed throughout the current year as well as at harvest time. For example,
an early season estimate based on observations for $\ell = 1, \ldots, L$ for
$t = 1, \ldots, T-1$ and only $\ell = 1$ for $t = T$ is often desired.

Of course, even though the estimate $\hat{P}_T = y^{-1}(\hat{\alpha}_T)$ of the
stratum's crop at harvest acreage proportion for the current year involves
only $\hat{\alpha}_T$, the $\hat{\alpha}_T$ depends on the entire multiyear data
set and the estimates of the segment effects and the systematic biases, which
are assumed to be constant from year to year.

The simplest transformation, y(p), of the estimated segment crop acreage
proportion, p, to use in (2) is the identity transformation

    y(p) = p.

However, it is very doubtful that the additive model (2) would hold for
y(p) = p, particularly if the p's exhibit a large variation within the stratum.
On the other hand, a multiplicative model for p may be more reasonable and
a logarithmic transformation, y(p) = ln(p), more appropriate. The logit

    y(p) = (1/2) ln[p/(1-p)]

is another useful transformation which approximately converts a multiplicative
model for p into an additive model for y(p). All three of the above
transformations are considered in Technical Report No. 20. There, approximate
expressions are derived for

(i) the variance of y(p),

(ii) the bias of $\hat{P}_T = y^{-1}(\hat{\alpha}_T)$,

(iii) the mean squared error of $\hat{P}_T = y^{-1}(\hat{\alpha}_T)$, and

(iv) confidence intervals for $P_T$

under the assumption that p arises from a binomial random variable.
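The three transformations and a first-order approximation to the variance of y(p) can be sketched as follows. This is only an illustration: it uses the standard delta method, Var[y(p)] ≈ [y'(p)]^2 p(1-p)/n, which is not necessarily the expression derived in Technical Report No. 20. The binomial sample size `n` and all function names are assumptions introduced here.

```python
import math

# Each entry: (transformation y, inverse of y, derivative y').
# The 1/2 factor in the logit follows the definition used in this report.
TRANSFORMS = {
    "identity": (lambda p: p,
                 lambda y: y,
                 lambda p: 1.0),
    "log": (lambda p: math.log(p),
            lambda y: math.exp(y),
            lambda p: 1.0 / p),
    "half_logit": (lambda p: 0.5 * math.log(p / (1.0 - p)),
                   lambda y: 1.0 / (1.0 + math.exp(-2.0 * y)),
                   lambda p: 1.0 / (2.0 * p * (1.0 - p))),
}

def approx_variance(p, n, name):
    """Delta-method variance of y(p) when p is a binomial proportion.

    n is a hypothetical binomial sample size (an assumption of this sketch).
    """
    _, _, derivative = TRANSFORMS[name]
    return derivative(p) ** 2 * p * (1.0 - p) / n

def approx_weight(p, n, name):
    """Reciprocal-variance weight suitable for a weighted least squares fit."""
    return 1.0 / approx_variance(p, n, name)
```

Under these assumptions the weight for an observation is obtained by plugging the estimated p into `approx_weight`; the inverse functions map an estimate on the transformed scale back to a proportion.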

When estimating the parameters ($\alpha_t$, $b_s$, $\delta_\ell$) in model (2), it
is not particularly reasonable to assume that the variance of $y(p_{ts\ell})$ is
the same for all t, s, $\ell$. Hence a weighted least squares analysis procedure
has been derived as opposed to the usual unweighted least squares procedure.
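A minimal sketch of a weighted least squares fit of the additive model (2) is given below. It is a simplification and not the procedure delivered under this project: the segment effects are treated as fixed effects constrained to sum to zero rather than as random variables, the last season effect is fixed at zero, and a weight of zero marks a missing observation. The function name and array layout are assumptions.

```python
import numpy as np

def fit_additive_model(y, w):
    """Weighted least squares fit of y[t, s, l] = alpha[t] + b[s] + delta[l].

    y, w : arrays of shape (T, S, L); w[t, s, l] = 0 marks a missing value.
    Returns (alpha, b, delta) with sum(b) = 0 and delta[L-1] = 0.
    """
    T, S, L = y.shape
    rows, targets, weights = [], [], []
    for t in range(T):
        for s in range(S):
            for ell in range(L):
                if w[t, s, ell] <= 0:
                    continue
                x = np.zeros(T + (S - 1) + (L - 1))
                x[t] = 1.0                      # year effect
                if s < S - 1:
                    x[T + s] = 1.0              # segment effect
                else:
                    x[T:T + S - 1] = -1.0       # last segment: sum-to-zero coding
                if ell < L - 1:
                    x[T + S - 1 + ell] = 1.0    # season bias (last one is zero)
                rows.append(x)
                targets.append(y[t, s, ell])
                weights.append(w[t, s, ell])
    X = np.array(rows)
    root_w = np.sqrt(np.array(weights))
    # Scaling rows by sqrt(weight) turns weighted into ordinary least squares.
    beta, *_ = np.linalg.lstsq(X * root_w[:, None],
                               np.array(targets) * root_w, rcond=None)
    alpha = beta[:T]
    b = np.append(beta[T:T + S - 1], -beta[T:T + S - 1].sum())
    delta = np.append(beta[T + S - 1:], 0.0)
    return alpha, b, delta
```

For an early season estimate in the current year, the weights for the current year's later seasons would simply be set to zero; the current year's effect is then still estimable because the season biases are estimated from the earlier years.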

A self-contained computer implementation of the weighted least squares
estimation procedure has been given to Lockheed, NASA, and ERIM.

Research is continuing under a new contract on several related issues.
The sensitivity of the estimate, $\hat{P}_T = y^{-1}(\hat{\alpha}_T)$, of the
stratum's at harvest crop acreage proportion to such things as the
transformation used, the accuracy of the weights, and the reliability of the
estimate of $\gamma = \sigma_b^2/\sigma_e^2$ is under study. The empirical
behavior of the approximate expressions for the bias of $\hat{P}_T$ and the
mean squared error of $\hat{P}_T$ as well as the approximate confidence
intervals on $P_T$ is also being evaluated. The extension of the
basic model (1) to include year-segment interactions and segment-season
interactions is being considered. Another possibility is to replace the
seasonal bias term in (1) by a covariate in terms of something like the
number of "crop calendar days" passed by the date of the last satellite
imagery used in determining the estimated segment at harvest crop acreage
proportion.

Another important line of research concerns the nature of the weights
themselves. If the true segment at harvest crop acreage proportion were
p* and the estimated p's were binomial in nature, then the variance of a
segment estimate p would be proportional to p*(1-p*). Furthermore, the
variance of y(p) could be derived for a given y, and the appropriate weight
in the weighted least squares procedure could be straightforwardly
approximated using the estimated p. However, the variance of the estimate
of the segment's at harvest crop acreage proportion may not be binomial in
nature but rather depend on such things as

(i) the satellite being used, 

(ii) the sharpness of the satellite imagery, 

(iii) the amount of satellite imagery available at the time of the
segment estimate,

(iv) the nearness of the segment's observed behavior to classical 
crop profiles, 

(v) the season during which the estimate is being made, 

(vi) the weather conditions during the crop's growing season, 

(vii) the composition of the segment, etc. 

The derivation of appropriate weights under this latter scenario is being
investigated.

2. Assessment of Multiyear Sampling Designs

In general, we have a population of segments which is to be sampled for T 
consecutive years. In any proposed sampling design, the units to be sampled 
can change from year to year but not at time points within the year. In 
addition, there is a positive correlation between the responses from a 
segment in consecutive years which can be utilized to reduce the standard 
errors of the estimators of the end of year means. The problem is to 
determine a T year sampling scheme which is optimal in some sense. 

In assessing possible multiyear sampling designs we assumed that the
eventual estimation would be based upon the multiyear model (2) discussed in
the preceding section. Our conclusions were derived from analytical results
for particular situations and from exhaustive enumeration of all possible
sampling designs for T = 2,3, L = 2,3, R = 2,3,4,5, and $\gamma$ = .25, .5, 1.0,
2.0, 4.0 (where $\gamma$ is the ratio of the variation between segments to the
variation between observations on the same segment due to measurement error).
Extensive simulations were also performed to determine the distributional
characteristics of the estimators under different sampling designs.
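The kind of comparison described above can be illustrated with a short sketch that computes the variance of the final year's estimated effect for a given design. It is not the software used for the enumeration: it assumes model (2) with random segment effects, equal error variances scaled to one, every sampled segment observed at all L season times, and generalized least squares estimation. The function name and the representation of a design as a list of segment lists are assumptions.

```python
import numpy as np

def final_year_variance(design, n_seasons, gamma):
    """Variance (in units of the error variance) of the GLS estimate of the
    last year's effect under model (2).

    design    : one list of sampled segment ids per year
    n_seasons : number of season times L observed for every sampled segment
    gamma     : between-segment variance divided by the error variance
    """
    T = len(design)
    segments = sorted({s for year in design for s in year})
    position = {s: i for i, s in enumerate(segments)}
    x_rows, z_rows = [], []
    for t, year in enumerate(design):
        for s in year:
            for ell in range(n_seasons):
                x = np.zeros(T + n_seasons - 1)
                x[t] = 1.0                       # year effect
                if ell < n_seasons - 1:
                    x[T + ell] = 1.0             # season bias (last one is zero)
                z = np.zeros(len(segments))
                z[position[s]] = 1.0             # random segment effect indicator
                x_rows.append(x)
                z_rows.append(z)
    X, Z = np.array(x_rows), np.array(z_rows)
    V = np.eye(len(X)) + gamma * Z @ Z.T         # covariance of the observations
    information = X.T @ np.linalg.solve(V, X)
    return np.linalg.inv(information)[T - 1, T - 1]

# Three two-year designs with four segments per year.
no_overlap = [[0, 1, 2, 3], [4, 5, 6, 7]]
full_overlap = [[0, 1, 2, 3], [0, 1, 2, 3]]
partial_overlap = [[0, 1, 2, 3], [2, 3, 4, 5]]
```

In this small example with one season time and $\gamma$ = 2, the no-overlap and full-overlap designs both give a variance of 0.75, while retaining half of the segments gives 0.65625, which illustrates why designs that rotate part of the sample can be preferable for the final year's estimate.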

Technical Reports 18, 19, and 22 describe for two and three year sampling
designs the behavior of the estimator of the stratum's at-harvest crop acreage
proportion in the last year of the design. Technical Report 18 obtains a
numerical efficiency for each two or three year sampling design for the case
where all segment observations have the same variance and hence the weighted
least squares estimator becomes simply a least squares estimator. Technical
Report 19 generalizes these results by considering the case where the variances
of the observations are not necessarily all equal. Here the more efficient
sampling designs from Technical Report 18 were compared in terms of the
distributions of the corresponding stratum crop acreage proportion estimators.
Finally, in Technical Report 22 these more efficient sampling designs were
compared in terms of the distributions of the corresponding stratum crop
acreage proportion estimators when cloud cover, etc. caused a random occurrence
of missing segment observations. This last study most closely reflects
reality. Specific sampling design recommendations are made in the individual
technical reports and are not recounted here.

In the paper "Rotation Sampling Designs" Gbur and Sielken discuss two 
optimality criteria for sampling designs which depart from the criterion 
considered in Technical Reports 18, 19, and 22. One of these criteria reflects 
the desire to minimize the average variance of the at-harvest crop acreage 
proportion estimator where the average is taken over all years instead of 
just the last year. The second criterion reflects the desire to minimize 
the variance of linear combinations of at-harvest crop acreage proportion
estimators over more than one year - for example, a desire to minimize the 
variance of the estimated change in the stratum's at-harvest crop 
acreage proportion from one year to the next. These two criteria do not 
necessarily lead to the same "optimal" designs nor do they always lead to 
the same "optimal" designs discussed in Technical Reports 18, 19, and 22. 

During our assessments of sampling designs it has been observed that for 
almost any good two year design there is an extension of that design to a 
third year which is at least a near-optimal three year design. This naturally 
suggests a sequential approach to constructing the sampling design. In our 
new contract we will undertake to develop and implement computer software
capable of sequentially constructing next year's design utilizing the
specified sample size for that year (possibly different from preceding years)
and utilizing the crop acreage proportion information gathered thus far as
well as the information on which sample segment observations were missing.

3. Utilizing Partially Identified Sample Segments 

A small sample of segments within a large region is selected. Each
sample segment is observed via satellite at several different times during
the crop growing seasons. The objective is to estimate for each crop of
interest the proportion of the region's acreage corresponding to that crop's
harvested acreage.

In Technical Report 21 we assume that there are only two crops of interest.
Furthermore, we assume that only data from the current growing year are to be 
used in estimating the crop at harvest proportions. The cases where more than 
two crops are of interest and/or data is available from more than one growing 
year will be considered in future research. 

The sample segments are all assumed to be of the same size. No assumption
is made about the region size or the segment size. The sampled segments are
assumed to represent a random sample (without replacement) from the segments
in the region.

Each sample segment is assumed to have been observed at least once during
the growing year and possibly several times. The two crops of interest are
designated as crop A and crop B. When a sample segment is observed, the
observation can have the form $(p_A, p_B, p_{other})$ where

$p_A$ = the estimated proportion of the segment which will be harvested in
crop A,

$p_B$ = the estimated proportion of the segment which will be harvested in
crop B, and

$p_{other} = 1 - p_A - p_B$ = the estimated proportion of the segment which will
not be harvested in either A or B.

Alternatively, estimates may not be made on A and B separately but only on
A and B collectively, so that the observation can have the form
$(p_{A+B}, p_{other})$, where

$p_{A+B}$ = the estimated proportion of the segment that will be harvested
in either A or B, and

$p_{other} = 1 - p_{A+B}$ = the estimated proportion of the segment that will
not be harvested in A or B.

The most recent segment estimates are assumed to reflect any previous 
observations made on that sample segment during the current growing year. 

If a sample segment's current observation is of the form $(p_{A+B}, p_{other})$,
then the sample segment is said to be partially classified or partially
identified. If its observation is of the form $(p_A, p_B, p_{other})$, then it
will be called completely identified.

The proportion of the region harvested in crop A will be denoted $P_A$
with $P_B$ similarly defined. The objective is to estimate $P_A$ and $P_B$ using
the observations on the sample segments. This estimation may have to be
made at more than one time during the growing year. Of course, if there
are no completely identified sample segments, only the sum $P_A + P_B$ can be
estimated on the basis of the sample segments.

Four alternative estimators of the region's at-harvest crop acreage
proportions are derived in Technical Report 21:

(1) maximum likelihood estimators,

(2) least squares estimators,

(3) weighted least squares estimators, and

(4) a combination of a least squares estimator of the relative
proportion of crop A out of crops A and B together and a maximum
likelihood estimator of the at-harvest combined acreage proportion
of crops A and B together.
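The idea behind estimator (4) can be sketched as follows. The formulas here are illustrative stand-ins and are not those of Technical Report 21: the combined proportion is estimated by the simple mean of the combined proportions over all sample segments (in place of the report's maximum likelihood estimator), and the relative proportion of crop A is estimated by a least squares fit through the origin of $p_A$ on $p_A + p_B$ over the completely identified segments. The function name and input layout are assumptions.

```python
def combination_estimate(complete, partial):
    """Estimate (P_A, P_B) from completely and partially identified segments.

    complete : list of (p_A, p_B) pairs from completely identified segments
    partial  : list of p_{A+B} values from partially identified segments
    """
    if not complete:
        raise ValueError("at least one completely identified segment is needed")
    # Combined proportion: every sample segment contributes its p_A + p_B.
    combined = [pa + pb for pa, pb in complete] + list(partial)
    p_ab = sum(combined) / len(combined)
    # Relative proportion of A: least squares slope through the origin.
    denominator = sum((pa + pb) ** 2 for pa, pb in complete)
    if denominator == 0:
        raise ValueError("completely identified segments contain no A or B")
    ratio = sum(pa * (pa + pb) for pa, pb in complete) / denominator
    return ratio * p_ab, (1.0 - ratio) * p_ab
```

The partially identified segments influence only the combined proportion, while the split between A and B rests entirely on the completely identified segments.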

The true test of an estimator's value is its performance on real data.
Hence a Monte Carlo study of the performance of the four estimation
procedures was carried out based upon two real sets of CAMS data.

There were several possible ways to measure the sample behavior of the
estimators. For each estimator and each of $\hat{P}_A$, $\hat{P}_B$, and
$1 - \hat{P}_A - \hat{P}_B$ the following measures were calculated for each
data set:

(i) average absolute error = the average over 1000 simulations of
$|\hat{P} - P_{region}|$ where $P_{region}$ represents the actual crop
proportion in the particular simulated region,

(ii) average squared error = the average over 1000 simulations of
$(\hat{P} - P_{region})^2$,

(iii) bias of average estimate = the difference between the average
$\hat{P}$ in 1000 simulations and $P_{set}$ where $P_{set}$ is the actual crop
proportion in the entire set of segments, and

(iv) sample variance of the estimator.
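The four measures above can be computed from the simulation output as in the following sketch. The function name and argument layout are assumptions; the sample variance uses the usual n - 1 divisor, which the report does not specify.

```python
import statistics

def performance_measures(estimates, region_truths, set_truth):
    """Summarize simulated estimates of one crop proportion.

    estimates     : estimate of the proportion from each simulation
    region_truths : actual proportion in each simulated region
    set_truth     : actual proportion in the entire set of segments
    """
    n = len(estimates)
    errors = [e - r for e, r in zip(estimates, region_truths)]
    return {
        "average_absolute_error": sum(abs(d) for d in errors) / n,
        "average_squared_error": sum(d * d for d in errors) / n,
        "bias_of_average_estimate": statistics.mean(estimates) - set_truth,
        "sample_variance": statistics.variance(estimates),  # n - 1 divisor
    }
```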

Some information is, of course, lost when some segments are only partially
identified. To assess this loss, the maximum likelihood estimators were also
calculated using the complete identification for all sampled segments. Since
these estimators utilize complete information for the entire sample of n segments
instead of complete information on only some of the n segments and partial
information on the remainder, these latter estimators perform better.

To show that the inclusion of the partially identified segments into the
estimation procedure is better than simply ignoring them, the maximum
likelihood estimators, least squares estimators, and weighted least squares
estimators were also calculated using only the subset of the n sample segments
corresponding to the completely identified segments.

In the Monte Carlo study the CAMS estimates of the segment's crop acreage
proportions were simulated as if they contained no errors. In order to
ascertain the impact of any such errors, the Monte Carlo study was repeated
with a normal deviate added to each of the segment's crop acreage proportion
estimates.

On the basis of the limited Monte Carlo study and the small follow-up
investigation the following conclusions were reached:

1) As long as there are some completely identified sample segments,
it is reasonable to estimate the individual crop proportions in the
region.

2) It is prudent to avoid having a large percentage (say 80%) of only
partially identified sample segments.

3) It is much better to incorporate the partially identified sample
segments into the estimators than it is to disregard the partially
identified sample segments.

4) When there are either no errors or only very small errors in the 
estimates of the segment's crop acreage proportions, the maximum 
likelihood estimators seem to be the best estimators, but they are 
not greatly superior to weighted least squares estimators or the 
use of a least squares ratio estimator. 

5) When there are fairly substantial errors in the estimates of the 
segment's crop acreage proportions, the combination of the least 
squares ratio estimator with the maximum likelihood estimator of the 
combined crop proportion is the superior estimator. 

The overall optimality of using the combination of the least squares ratio
estimator and the maximum likelihood estimator of the combined crop proportion
suggests some definite possibilities for further research, which we hope to
pursue. In particular, by simply treating the combined crops as a single crop,
the combined crop proportion can be estimated when there is more than one
year's data by utilizing the current methodology derived for the multiyear
model (2) described in section 1. Then this multiyear based combined crop
proportion can be subdivided into individual crop proportions using, for
example, the least squares ratio estimator based on the current year.

4. Remarks 

The productivity of this research period has been aided by the support
and cooperation of many NASA, Lockheed, and ERIM personnel. We look
forward to our future joint research.